Adding median statistics #106

Open · wants to merge 1 commit into master
Conversation

@nicoborghi nicoborghi commented Oct 19, 2023

The median statistic is commonly used in fields like cosmology to report central values and confidence intervals. It sets the central point to the median of the posterior distribution, and determines the upper and lower bounds from the percentiles (e.g. 16th and 84th percentiles for a "1σ" interval).

This is a quick implementation that makes use of the summary_area attribute to compute the (symmetric) percentiles.
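For intuition, the statistic described above amounts to something like the following standalone sketch (illustrative only, not the PR's actual code; `summary_area` here is a plain float standing in for the attribute of the same name):

```python
import numpy as np

# Illustrative sketch of the median statistic: central value at the median,
# bounds at symmetric percentiles. summary_area = 0.6827 reproduces the
# familiar 16th/84th-percentile "1-sigma" interval.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 100_000)
summary_area = 0.6827

lower, center, upper = np.quantile(
    samples, [0.5 - summary_area / 2, 0.5, 0.5 + summary_area / 2]
)
print(f"x = {center:.2f} +{upper - center:.2f} -{center - lower:.2f}")
```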

@Samreay (Owner) commented Oct 19, 2023

Hi @nicoborghi, thanks for the PR! I just want to make sure I'm not missing something obvious, but isn't this the same summary statistic as SummaryStatistic.CUMULATIVE - just without any histogram or smoothing going into it?

I also don't believe we can just use np.percentile, because this doesn't cater to samples possibly having different weights
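For reference, one common way to handle unequal weights is to interpolate on the weighted empirical CDF. A minimal sketch (my own illustration, not ChainConsumer's internals):

```python
import numpy as np

def weighted_quantile(values, quantiles, weights):
    """Quantiles of `values` under sample `weights`, computed by linear
    interpolation on the weighted empirical CDF."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Position of each sorted sample on the weighted CDF, in (0, 1].
    cdf = np.cumsum(weights) / np.sum(weights)
    return np.interp(quantiles, cdf, values)

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 50_000)
w = np.ones_like(x)
# With uniform weights this agrees with np.quantile to sampling precision.
print(weighted_quantile(x, [0.1587, 0.5, 0.8413], w))
```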

@nicoborghi (Author) commented Oct 20, 2023

Oh yes, you are right; I had not considered the problem of weights!

I really like ChainConsumer, including the LaTeX table feature, and in cases like this I prefer to quote the parameters as the median and 16th-84th percentiles, which is less sensitive to outliers and avoids manually tuning the smoothing, binning, or KDE parameters. However, when preparing the code below I realized that in these extreme cases the plot would differ significantly from the summary statistic.

Maybe it would be helpful to offer it just for the table, or to perform the smoothing after removing the outliers?

Thanks for your time!

import numpy as np
import pandas as pd
from chainconsumer import Chain, ChainConsumer, PlotConfig

# Standard normal samples plus a far-away outlier population
arr = np.hstack([np.random.normal(0, 1, 1000), np.random.normal(1000, 1, 100)])

c = ChainConsumer()
c.set_plot_config(PlotConfig(show_legend=True))
c.add_chain(Chain(samples=pd.DataFrame(arr, columns=["x"]), name="Median",
                  statistics="median"))

d = ChainConsumer()
d.set_plot_config(PlotConfig(show_legend=True))
d.add_chain(Chain(samples=pd.DataFrame(arr, columns=["x"]), name="Cumulative",
                  statistics="cumulative"))

e = ChainConsumer()
e.set_plot_config(PlotConfig(show_legend=True))
e.add_chain(Chain(samples=pd.DataFrame(arr, columns=["x"]), name="Cumulative, smooth=0",
                  statistics="cumulative", smooth=0))

fig1 = c.plotter.plot(figsize=(5, 2))
fig2 = d.plotter.plot(figsize=(5, 2))
fig3 = e.plotter.plot(figsize=(5, 2))

[Three figures: the resulting posterior plots for the Median, Cumulative, and Cumulative (smooth=0) statistics]

@Samreay (Owner) commented Oct 21, 2023

Hmm, my main concern with adding this is that the two methods should converge, except when you have issues in your chain, as you showed in your example. The two methods differing given bad inputs isn't a worry to me.

You could also recover results that are always within an arbitrary number of significant digits if you just use cumulative and ramp the number of bins up while setting smooth=0, but I'd agree this isn't very user friendly.

I'm still happy to add this, so long as we can get it working for weighted samples, and I'd also recommend using np.quantile instead of np.percentile(100 * list_of_quantiles)
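To illustrate the suggested swap (same result, without the factor-of-100 bookkeeping):

```python
import numpy as np

samples = np.linspace(-3.0, 3.0, 101)
qs = [0.1587, 0.5, 0.8413]

# np.quantile takes quantiles in [0, 1] directly ...
a = np.quantile(samples, qs)
# ... whereas np.percentile needs them scaled to [0, 100].
b = np.percentile(samples, [100 * q for q in qs])

print(np.allclose(a, b))  # the two calls are equivalent
```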
